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Abstract 

We  address  the  problem  of  testing  complex  reactive  control  systems 
and  validating  the  effectiveness  of  multi-agent  controllers.  Testing  and 
validation  involve  searching  for  conditions  that  lead  to  system  failure  by 
exploring  all  adversarial  inputs  and  disturbances  for  errant  trajectories. 
This  problem  of  testing  is  related  to  motion  planning.  In  both  cases,  there 
is  a  goal  or  specification  set  consisting  of  a  set  of  points  in  state  space  that 
is  of  interest,  either  for  finding  a  plan,  demonstrating  failure  or  for  valida¬ 
tion.  Unlike  motion  planning  problems,  the  problem  of  testing  generally 
involves  systems  that  are  not  controllable  with  respect  to  disturbances 
or  adversarial  inputs  and  therefore,  the  reachable  set  of  states  is  a  small 
subset  of  the  entire  state  space.  We  choose  to  apply  sampling-based  al¬ 
gorithms  to  the  testing  and  validation  problem.  Our  work  is  based  on 
the  Rapidly-exploring  Random  Trees  (RRT)  algorithm.  First  we  analyse 
some  of  the  factors  that  govern  the  exploration  rate  of  the  RRT  algo¬ 
rithm.  The  analysis  serves  to  motivate  our  enhancements.  Then  we  pro¬ 
pose  three  modifications  to  the  original  RRT  algorithm,  suited  for  use  on 
uncontrollable  systems.  First,  we  introduce  a  new  distance  function  which 
incorporates  information  about  the  system’s  dynamics  to  select  nodes  for 
extension.  Second,  we  introduce  a  weighting  to  penalize  nodes  which  are 
repeatedly  selected  but  fail  to  extend.  Third,  we  propose  a  scheme  for 
adaptively  modifying  the  sampling  probability  distribution,  based  on  tree 
growth.  We  demonstrate  the  application  of  the  algorithm  via  several  ex¬ 
amples  and  provide  computational  statistics  to  illustrate  the  effect  of  each 
modification.  The  final  algorithm  is  demonstrated  on  a  25  state  exam¬ 
ple  and  results  in  nearly  an  order  of  magnitude  reduction  in  computation 
time  when  compared  with  the  traditional  RRT.  Our  algorithms  are  also 
applicable  to  motion  planning  for  systems  that  are  not  small  time  locally 
controllable. 
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1  Introduction 


As  the  use  of  logic-based  or  reactive  control  laws  grows  in  both  robotics  and 
other  fields,  so  does  the  need  for  automated  design  and  analysis  tools.  The  focus 
to  date  in  the  automated  safety  verification  literature  has  been  on  the  solution  of 
the  reachability  problem,  initially  through  symbolic  methods  (e.g.,  [1,  38])  and 
later  through  numerical  techniques  (e.g.,  [45,  18]).  However,  the  class  of  systems 
for  which  the  reachability  problem  is  tractable  is  quite  limited  in  both  expres¬ 
siveness  and  dimensionality.  An  alternative  approach  to  exhaustively  proving 
safety  is  to  simply  search  for  a  single  counter  example  -  a  series  of  inputs,  distur¬ 
bances  or  parameters  that  causes  a  system  to  fail.  We  term  this  semi-decision 
approach  the  Testing  Problem. 

Inspired  by  the  connections  between  the  Testing  Problem  for  complex  con¬ 
trol  systems  and  the  Motion  Planning  problem,  we  have  recently  applied  the 
Rapidly-exploring  Random  Tree  (RRT)  algorithm  to  the  Testing  Problem  [21,  4] 
with  considerable  success.  The  RRT  algorithm  is  an  incremental,  randomized 
search  algorithm  which  explores  state  space  fast  and  uniformly.  However,  the 
Motion  Planning  and  Testing  problems  are  different.  Perhaps  the  most  sig¬ 
nificant  difference  between  the  two  lies  in  the  nature  of  the  system  dynamics 
in  each  case.  Robotic  systems  are  almost  always  controllable  (by  design),  so 
the  reachable  space  is  often  the  entire  free  space.  With  the  exception  of  any 
workspace  obstacles,  whose  configurations  are  known  in  advance,  the  tree  can 
be  expected  to  extend  to  fill  the  entire  state  space.  On  the  other  hand,  when 
we  test  complex  control  systems,  it  is  frequently  with  respect  to  disturbances  or 
adversarial  inputs.  These  systems  are  frequently  not  controllable  with  respect 
to  disturbances  or  adversarial  inputs  —  in  fact,  the  reachable  set  is  usually  a 
tiny  fraction  of  the  entire  state  space. 

In  such  systems,  (1)  the  lack  of  an  obvious  metric  to  estimate  connection 
time,  (2)  the  natural  Voronoi  bias  of  the  RRT  algorithm  and  (3)  the  tradi¬ 
tional  uniform  sampling  strategy,  lead  to  a  slow  exploration  rate.  With  the 
goal  of  increasing  the  exploration  rate  for  complex  dynamic  systems,  we  pro¬ 
pose  three  modifications  to  the  original  RRT  algorithm.  First,  we  develop  a 
new  distance  function  which  encodes  local  information  about  the  system’s  dy¬ 
namic  constraints  with  a  first  order  approximation  (improved  metric).  Second, 
because  the  reachable  state  space  is  generally  a  small  fraction  of  the  total  state 
space,  we  introduce  a  weighting  factor  which  penalizes  the  repeated  extension 
of  boundary  nodes  (mitigate  Voronoi  bias).  Finally,  we  propose  a  scheme  for 
adaptively  modifying  the  sampling  probability  distribution  between  the  tradi¬ 
tional  uniform  distribution  and  heavily  biased  toward  the  specification  set  based 
on  tree  growth  (adaptive  sampling). 

The  paper  is  organized  as  follows.  In  Section  2.1  we  formally  define  the 
testing  problem.  Section  2.2  reviews  the  original  RRT  algorithm  and  the  most 
relevant  literature.  Section  3  defines  and  analyzes  the  factors  affecting  the 
exploration  rate.  In  light  of  this  analysis,  Section  4  examines  three  key  features 
of  the  traditional  RRT  algorithm  which  are  troublesome  for  testing  problems; 
proposes  methods  to  remedy  them  and  presents  simple  illustrative  examples, 
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complete  with  comparative  computational  statistics.  A  new  algorithm  unifying 
the  enhancements  is  presented  in  Section  5.1.  The  algorithm  is  used  to  solve  a 
multi-agent  pursuit-evasion  problem  and  performance  statistics  are  discussed  in 
Section  5.2.  Concluding  remarks  follow  in  Section  6. 


2  Background  and  Related  Work 

2.1  Problem  Statement 

Definition  2.1  We  define  a  Finite  Time  Control  System  as  a  tuple  C  = 
(X,U,T,  Init,  f)  where 

•  X  C  R"  is  a  set  of  free  state  variables; 

•  U  C  Rm  is  a  compact  set  of  input  values; 

•  T=  [t0,tf]  Cl  is  a  compact  time  interval  the  system  evolves  over; 

•  Init  C  X  is  a  compact  set  of  possible  initial  conditions;  and 

•  /  :  X  x  U  — >  1"  is  a  vector  field  which  prescribes  the  time  derivative  of 
the  state  variables. 

We  are  generally  interested  in  systems  with  collections  of  rigid  bodies  with  very 
complicated  dynamics,  especially  high-dimensional  continuous  systems  or  hy¬ 
brid  (discrete/continuous)  and  switched  systems  where  /  may  be  a  non-smooth 
function  of  x.  We  do  not  impose  any  structure  on  the  nature  of  the  dynamics 
(except  assuming  that  solutions  exist  in  the  sense  of  Filippov  [48],  which  permits 
sliding  modes).  We  use  the  term  “input”  in  the  most  general  sense  in  that  it 
can  include  yet  unspecified  feedback  control  inputs,  human-in-the-loop  inputs, 
and  disturbances. 

Problem  2.2  Testing  Problem:  Given  a  tuple  (C,x°,S),  where 

•  C  =  (X,  U,  T,  Init,  f)  is  a  finite  time  control  system, 

•  x°  €  Init,  and 

•  S  is  a  specification  set, 

the  goal  is  to  determine  an  open  loop  control  law  U  :  T  — >  U  such  that  3t  G  T 
for  which  x(t)  €  S. 

In  other  words,  the  goal  is  to  determine  a  counter-example  -  an  input  sequence 
which  will  cause  the  system  to  fail  by  entering  S  -  if  one  exists.  However,  in 
order  to  make  the  problem  algorithmically  tractable,  instead  of  searching  the 
set  of  all  possible  functions  U  :  T  — >  U,  the  search  must  be  restricted  to  some 
subset  of  functions  with  finite  dimensional  parameterization  [12]. 

We  make  three  assumptions  that  simplify  the  presentation  of  the  key  ideas 
in  the  paper  without  restricting  the  scope  in  a  significant  way.  First,  we  will 
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assume  there  is  a  simple  test  that  can  be  applied  to  determine  if  a  point  in  R"  is 
a  member  of  X  Cl".  Second,  assume  the  specification  set  S  can  be  defined  as 
the  sub-level  set  of  some  function  S  =  {x\x  G  X,  s(x)  <  0}.  Finally,  we  restrict 
our  search  over  U  to  piecewise  constant  functions  of  time  with  k  segments,  each 
of  time  duration  At.  Thus,  instead  of  the  continuous  map  IA.  we  consider  the 
search  over  U  :  T  — >  U,  as  the  search  for  a  k-vector  of  parameters.  With  ul  £  U 

u  =  [u1,  u2, . . . ,  uk]T 

so  the  input  u(t)  is  given  by 

u(t)  =  ul  GU  if  t°  +  (i  —  1)A t  <t<t°  +  ( i)At 
for  i  =  1, . . . ,  k. 

2.2  Related  Work 

While  there  has  been  a  great  deal  of  work  on  this  problem,  complete  meth¬ 
ods  for  robot  motion  planning  have  prohibitive  complexity.  It  was  shown  that 
the  generalized  mover’s  problem  is  PSPACE-hard  [47].  Explicit  construction  of 
configuration  space  is  computationally  impractical.  This  has  motivated  many 
researchers  to  focus  on  sampling-based  randomized  algorithms  that  can  solve 
many  challenging  high-dimensional  problems  efficiently  at  the  expense  of  com¬ 
pleteness. 

In  the  Probabilistic  Roadmap  (PRM)  method  and  its  variants  [29,  31,  2,  32, 
30,  10],  a  network  of  paths  called  a  roadmap  is  constructed  in  the  configuration 
space  by  generating  random  configurations  and  attempting  to  connect  pairs  of 
nearby  configurations  with  a  local  planner  (preprocessing  phase).  After  con¬ 
struction  of  the  roadmap,  the  user-specified  initial  configuration  and  the  goal 
configuration  are  connected  to  the  roadmap  and  a  solution  path  is  obtained 
by  a  graph-search  algorithm  (query  phase).  Modifications  to  the  sampling  of 
random  configurations  have  been  suggested  such  as  visibility-based  PRM  [49], 
Gaussian  sampling  [7],  the  bridge  test  [26]  and  the  medial  axis  method  [25,  52]. 
Single-query  PRM,  called  Lazy  PRM  is  proposed  in  [6].  In  Lazy  PRM,  the  col¬ 
lision  check  is  conducted  only  in  the  query  phase  and  the  roadmap  is  updated 
as  the  planner  searches  a  solution  path  to  minimize  the  number  of  collision 
checks.  Recently,  the  influence  of  the  sampling  measure  and  sampling  source  on 
the  PRM  planner’s  performance  was  presented  in  [28]  showing  the  source  has 
small  impact  on  a  planner’s  performance  as  compared  to  the  measure.  If  we 
can  design  a  local  planner  that  connects  pairs  of  configuration  satisfying  corre¬ 
sponding  constraints,  a  feasible  path  is  obtained  by  simple  concatenation  of  the 
successive  path  segments.  Therefore,  the  PRM  method  can  be  easily  applied  to 
virtually  any  type  of  holonomic  robots.  However,  the  connection  problem  can 
be  as  difficult  as  designing  a  nonlinear  controller,  particularly  for  complicated 
nonholonomic  and  dynamic  systems. 

The  problem  of  finding  a  suitable  trajectory  and  control  inputs  to  drive 
a  robot  from  an  initial  state  to  a  goal  state  while  satisfying  physically-based 
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dynamic  constraints  has  been  an  active  research  area  in  many  fields.  The  de¬ 
coupled  approach,  in  which  one  solves  a  basic  path  planning  problem  followed  by 
finding  a  trajectory  and  controller  that  satisfies  the  dynamics,  has  been  applied 
successfully  to  a  variety  of  problems.  However,  decoupled  approaches  often  fail 
to  find  a  feasible  solution  due  to  the  system  dynamics  or  constraints  on  control 
inputs.  It  is  often  the  case  that  kinematic  and  dynamic  constraints  have  to 
be  taken  into  consideration  simultaneously.  This  type  of  problem  is  known  as 
kinodynamic  motion  planning  [20].  For  planning  under  differential  constraints, 
expansive  space  trees  (ESTs)  [27]  assign  a  weight  to  each  node  indicating  how 
densely  the  neighborhood  of  the  node  has  already  been  explored  and  choose  a 
node  to  extend  with  probability  inversely  proportional  to  the  weight.  PDST- 
EXPLORE  algorithm  uses  nonuniform  subdivision  of  state  space  and  sampling 
of  paths  rather  than  states  [36,  37]. 

A  Rapidly-exploring  Random  Tree  (RRT)  is  a  randomized  algorithm  that 
is  designed  for  a  broad  class  of  motion  planning  problems  [40,  41].  The  ad¬ 
vantage  of  RRT  algorithm  is  that  they  work  directly  with  the  set  of  admissible 
inputs  and  are  therefore  directly  applicable  to  systems  with  complex  dynamics. 
It  is  well  suited  to  the  problem  of  quickly  searching  high-dimensional  spaces 
that  have  both  algebraic  and  differential  constraints.  The  key  idea  is  to  bias 
the  exploration  toward  unexplored  portions  of  the  space  by  randomly  sampling 
points  in  the  state  space  and  incrementally  pulling  the  search  tree  toward  them, 
reducing  the  size  of  largest  Voronoi  regions  as  the  tree  grows.  Therefore,  the 
graph  explores  the  state  space  uniformly  and  quickly.  The  RRT  algorithm  has 
experienced  widespread  success  in  solving  a  variety  of  high  dimensional  and 
nonlinear  problems  in  motion  planning  [34,  14,  15]. 

We  base  our  approach  on  the  Rapidly-exploring  Random  Tree  (RRT)  algo¬ 
rithm.  A  very  basic  algorithm  is  given  in  Algorithms  1  and  2,  where  p  is  some 
suitable  metric  and  pdf  is  a  probability  distribution.  The  RRT  algorithm  is 
attractive  because  it  works  directly  in  the  space  of  admissible  inputs  making 
them  suitable  for  systems  with  dynamic  constraints  and  because  it  is  probabilis¬ 
tically  complete  [40].  However,  surprisingly  little  attention  has  been  directed  at 
predicting  or  measuring  the  rate  at  which  the  trees  explore  the  space. 

There  exists  much  work  on  safety  verification.  Reachability  analysis  has  been 
an  active  research  area.  Symbolic  methods  [1,  24,  38,  23]  are  found  for  very  lim¬ 
ited  classes  of  hybrid  systems.  Approximate  reachability  computation  relies 
on  numerical  methods  such  as  Hamilton- Jacobi  equation  [45,  50,  44],  flow-pipe 
approximation  [16,  17,  18],  ellipsoidal  calculus  [35,  8],  and  polyhedral  approx¬ 
imation  [19,  3].  The  barrier  certificates  method,  similar  to  Lyapunov  stability 
results,  is  proposed  in  [46].  However,  construction  of  barrier  certificates  is  gen¬ 
erally  not  easy  and  a  computational  technique  is  known  only  for  polynomial 
systems.  The  class  of  hybrid  systems  for  which  the  reachability  problem  is 
tractable  is  quite  limited  in  both  expressiveness  and  dimensionality.  The  pri¬ 
mary  reason  for  this  limitation  stems  from  the  difficulties  explicitly  representing 
and  manipulating  the  reachable  set.  Such  an  approach  overcomes  traditional 
limitations  because  it  makes  no  attempt  to  explicitly  represent  the  reachable 
set.  The  drawback  however  is  that  it  is  a  semi-decision  method  the  test  is 
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inconclusive  if  no  counter  example  is  found.  However,  for  non-linear,  high  di¬ 
mensional  systems  there  is  at  this  time  no  alternative  method  to  verify  safe 
operation. 

The  approach  of  using  the  RRT  algorithm  to  analyze  logic-based  control  sys¬ 
tems  is  recent.  In  [22]  RRT  algorithm  was  used  to  design  trajectories  of  hybrid 
systems.  The  first  published  work  using  the  RRT  algorithm  for  analyzing  hybrid 
systems  is  [9,  42].  In  a  similar  vein,  a  blimp  system  control  law  was  validated 
under  unpredictable  but  bounded  disturbances  [33] .  In  [5] ,  the  reachable  set  for 
aircraft  collision  avoidance  problem  was  obtained  and  several  extensions  of  the 
RRT  approach  were  mentioned.  We  have  applied  a  variant  of  this  method  [4] 
to  testing  hypotheses  and  establishing  properties  of  biological  networks. 


Algorithm  1  Generate  RRT:  T 
Initialize  RRT:  T.addVertex(a;0) 
while  fix  £  T  such  that  s(x)  <  0  do 
Extend(T) 

end  while 


Algorithm  2  Extend(T) 

xrand  eX(_  pdf() 

xnear  arg  mi nxj eT  p(xj ,  Xrand) 

unew  =  arg imivueij{p(xnear  +  fAt  f(x,u)dt,xrand )} 

Xnew  =  xnear  +  / (X ,  Unew (t)) dt 

T.  add  Vertex^"®*" ) 

T. addEdge (unew ,  xnear  ->•  xnew) 


There  have  been  several  enhancements  to  the  basic  RRT  algorithm.  In  [13]  a 
method  for  penalizing  the  repeated  selection  of  collision  prone  nodes  for  exten¬ 
sion  is  introduced.  In  [43]  a  node  selection  strategy  is  described  which  increases 
the  natural  Voronoi  bias  of  the  method  for  the  purposes  of  dispersion  reduc¬ 
tion.  In  [51,  53]  adaptive  RRT  methods  for  problems  with  complex  obstacles 
are  addressed.  However,  no  approach  is  able  to  reliably  reduce  the  dispersion 
(which  must  be  measured  within  the  reachable  set)  for  uncontrollable  systems 
with  dynamic  constraints.  Biasing  the  sampling  toward  regions  close  to  the 
goal  state  has  been  tried  in  [41],  [42]  and  [9]  with  some  success.  However  the 
sample  bias  factor  is  fixed  a  priori  and  it  can  lead  to  difficulties  in  non-convex 
systems  because  of  the  presence  of  local  minima.  In  [33] ,  a  metric  accounting 
for  under-actuated  dynamics  is  suggested  but  is  specific  to  the  aerial  robots  ex¬ 
ample  considered  there.  In  [21]  the  Rapidly-exploring  Random  Forest  of  Trees 
(RRFT)  algorithm  was  introduced  which  searches  over  time  invariant  parame¬ 
ters  by  planting  many  RRTs  at  a  sampling  of  parameter  values.  All  the  trees 
are  grown  simultaneously.  Individual  trees  may  be  terminated  if  they  fail  to 
grow  at  a  sufficient  rate. 
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3  Exploration  rate 

The  goal  of  the  testing  problem  is  to  find  a  sequence  of  inputs  from  x°  to  S  as 
quickly  as  possible.  While  easily  measurable  and  of  great  practical  importance, 
the  required  time  to  find  a  solution  for  a  given  system  strongly  depends  on  the 
location  of  S  and  a;0  in  X.  Alternatively,  because  a  desirable  feature  of  any 
sampling-based  algorithm  is  to  rapidly  explore  the  state  space,  a  measure  of 
exploration  rate  would  be  more  useful  and  perhaps  more  robust  as  a  performance 
measure.  In  this  section,  we  define  what  we  mean  by  “exploring”  the  state  space, 
and  analyze  the  factors  that  influence  the  rate  of  exploration.  The  analysis 
points  to  some  methods  of  improving  the  exploration  rate. 

We  would  like  to  determine  what  fraction  of  the  reachable  set  a  tree  T  has 
explored.  However,  because  a  tree  node  xk  is  a  point  it  has  measure  zero  and, 
therefore,  a  collection  of  tree  nodes  can  never  fill  the  state  space.  Instead,  define 
the  set  of  states  that  could  be  reached  from  a  node  xk(tk)  in  a  single  iteration 
with  time  step  At  as 


R(xk,tk,  At) 


ftk+At 

{x  G  X\3u  €  U,  x  =  xk  +  /  f(x,u)dt}. 

Jtk 


(1) 


Note  that  this  set  has  a  non-zero  measure,  allowing  us  to  discuss  its  volume 
which  can  be  related  to  the  fraction  of  the  space  that  can  be  explored  from  that 
node.  This  notation  has  many  uses,  as  illustrated  in  Figure  1. 

•  The  set  of  possible  locations  for  xnew  in  a  single  iteration  is  R(xnear ,  tnear ,  At) 
where  xnear  is  selected  to  minimize  some  metric  p(xnear ,  xrand) . 

•  The  reachable  set ,  R(x°,  t°,  AT),  where  AT  =  t^—t°  is  the  set  of  all  states 
that  can  possibly  be  explored  from  initial  condition  x°  within  the  given 
time  interval  T  =  [t°,t^]. 

•  The  explored  set  is  defined  by  R{T~kj  At)  =  Ufc^o1  R(xk,tk,  At).  It  is  the 
set  of  states  that  the  tree  can  possibly  explore  in  the  next  iteration,  from 
any  of  the  K  nodes  in  a  tree.  If  a  state  within  the  specification  set  S  (or 
goal)  is  located  within  R(7k,  At),  the  algorithm  can  terminate. 

Using  this  notation  we  define  the  explored  volume  fraction  of  a  tree  with  K 
nodes  as 

n  _  At)) 

K  p(R(x0,t0,AT))’  1  ’ 

the  ratio  of  the  volume  of  the  explored  set  to  that  of  the  reachable  set,  where 
p(*)  is  the  measure  (volume)  of  the  set. 

When  a  new  node,  xK  is  added  to  the  tree  Tr-  we  can  define  the  change  in 
the  explored  volume  fraction  as  the  growth  of  the  explored  volume  fraction 


9k+ i  — 


p(R(xK,tK,At)-R(TK,At)) 

Mi?(z°,f°,AT)) 


(3) 
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Figure  1:  An  illustration  of  R(x°,  t°,  AT),  R(xnear ,  tnear ,  At)  and  R(TK,At) 


Since,  with  the  exception  of  x°,  the  location  of  each  new  node  is  a  random 
variable,  it  is  meaningful  to  discuss  the  change  in  the  expected  value  of  the 
explored  volume  fraction,  C  or  the  growth  of  the  explored  volume  fraction  g : 

K+l 

Ck+i  =  Ck  +  9k+ i  =  Ci  +  ^  gk  (4) 

k= 2 


In  order  to  understand  the  dynamics  of  C,  we  need  to  know  how  gx+i  varies 
with  xrand  (illustrated  in  Figure  2).  To  that  end  we  partition  the  state  space 
into  three  regions.  Let  int(Tx)  be  the  set  of  all  nodes  that  do  not  support  the 
boundary  of  R(Tk ,  At). 

1.  Interior.  xrand  £  U keint(TK)  R{xk >  tk,  At).  In  this  case  xrand  is  selected  in 
the  interior  of  previously  explored  area.  No  new  growth  results  {gx+i  = 
gint  =  0). 

2.  Exterior:  xrand  £  R(TK,At).  In  this  case  xrand  is  selected  outside  of  the 
previously  explored  region  and  gK+ 1  =  gext  £  [0,gmax],  depending  on  the 
accuracy  of  the  metric  in  selecting  the  xnear  with  the  most  potential  for 
extension. 

3.  Boundary:  xrand  £  R(Tx,At)  —  U keint(TK)R(xk ,tk ,At).  This  case  rep¬ 
resents  xrand  in  a  boundary  layer  of  the  previously  explored  region.  As 
in  the  previous  case  gx+i  G  [0,gmax],  except  the  value  depends  on  the 
location  of  xrand  as  well  as  xnear. 

It  is  difficult  the  extent  to  which  boundary  effects  mentioned  in  Case  3  affect  the 
value  of  gx+i  because  it  is  a  function  of  the  current  tree  shape.  In  the  interest 


Figure  2:  Three  possible  cases  for  gx+i  depending  on  the  region  in  which  xrand 
is  generated. 


of  streamlining  the  analysis,  let  us  assume  the  boundary  region  is  small.  Then: 

9k+ i  =  Pext  •  9ext  (5) 

where  Pext  is  the  probability  xrand  is  selected  in  the  exterior  of  the  explored 
region;  and  gext  is  the  expected  growth  in  that  case.  Therefore,  in  order  to 
improve  9k+  i  our  algorithmic  enhancements  will  be  targeted  at: 

1.  increasing  gext  by  using  a  metric,  suitable  for  systems  with  complex  dy¬ 
namics,  which  is  more  apt  in  selecting  nodes,  xnear  with  more  potential 
for  growth  (Section  4.1  and  4.2);  or 

2.  increasing  Pext  by  altering  the  probability  distribution  to  generate  xrand 
in  unexplored  regions  of  the  state  space  (Section  4.3). 

Finally,  we  need  a  method  to  experimentally  measure  the  fraction  of  the 
explored  set  to  evaluate  the  effect  of  our  enhancements.  Because  the  area  of  the 
reachable  set  is  unknown  a  priori ,  from  a  practical  point  of  view  it  is  sometimes 
useful  to  define  explored  volume  fraction  with  respect  to  the  entire  set  X  rather 
than  the  reachable  set  in  X. 

nx  _  v{R{Tk,  At))  f  ^ 

°K  ~  rtx)  ■  (6) 

Even  still,  measuring  R(xk,tk,At )  in  high  dimensions  and  accounting  for  the 
overlaps  in  computing  the  union  is  frequently  impossible.  For  complex  dynamic 
systems,  the  volume  of  the  explored  set  is  often  difficult  to  calculate.  Typically 
dispersion  is  used  [39]  which  is  loosely  defined  as  the  radius  of  the  largest  ball 
in  X  which  does  not  contain  a  tree  node.  Unfortunately,  it  is  difficult  to  com¬ 
pute  and  because,  by  only  focusing  on  the  largest  such  ball,  it  yields  an  overly 
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R(x°,t°,AT ) 


Figure  3:  The  white  triangle  is  R(x°,  t°,  AT)  -  the  total  reachable  set.  The  light 
shaded  regions  constitute  R{Tk ,  At).  Nodes  selected  for  extension  on  the  basis 
of  their  distance  from  xrand  (such  as  xl)  may  be  not  be  ideal  to  grow  R(Tk,  At) 
when  the  system  is  not  small  time  locally  controllable.  x°  is  a  better  candidate 
for  extension. 


conservative  estimate  of  explored  set.  Instead,  we  introduce  a  dispersion-based 
explored  fraction.  Given  a  tree  Tk,  a  set  of  grid  points  G  C  X  with  spacing  5x, 
and  some  suitable  metric  p 


C$(Tk,G)  =  1- 


— — —  ^2  min (p(x9,TK),Sx). 

'  '  orQtZn 


(7) 


We  use  the  dispersion-based  explored  fraction  for  the  purposes  of  monitoring 
tree  growth  for  complex  dynamic  systems. 


4  Enhancements  to  the  RRT  Algorithm 

In  this  section,  we  propose  three  modifications  to  the  original  RRT  algorithm, 
all  designed  to  improve  the  expected  value  of  the  growth  of  the  explored  volume 
fraction  gx+i-  Referring  to  Equation  (5),  the  first  two  enhancements  (Sec¬ 
tion  4.1  and  4.2)  modify  the  metric  used  to  select  an  xnear  in  an  attempt  to 
improve  the  growth  per  iteration  when  xrand  is  outside  the  explored  region, 
gext.  This  is  especially  challenging  for  systems  with  complex  dynamics,  since 
there  are  no  obvious  metrics  to  establish  temporal  proximity  relationships  and 
the  reachable  set  is  not  known  a  priori.  The  third  enhancement  (Section  4.3), 
attempts  to  increase  the  probability  xrand  is  selected  in  an  unexplored  region  of 
the  state  space,  Pext.  While  there  exist  many  distributions  that  can  accomplish 
this,  we  choose  to  bias  our  samples  near  the  un-reached  specification  (goal)  set 
when  advantageous. 
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4.1  Dynamics-based  selection  of  proximal  node 

Example  4.1  Consider  the  trivial  example 

x\  =  2,  x2  =  u,  (8) 


where  u  G  U  =  [1,2]. 


The  reachable  set  I?(x°,  f°,  AT),  which  is  normally  unknown  can  easily  be  com¬ 
puted  by  hand  in  this  case,  and  is  shown  as  the  white  region  in  Figure  3.  A 
state  xrand  is  generated  and  the  planner  must  select  the  “closest”  tree  node, 
xnear  attempt  to  connect  from.  Line  2  of  Algorithm  2  (traditional  RRT)  se¬ 
lects  xnear  <—  x1  for  extension  based  on  proximity  to  xrand,  as  determined  by  a 
distance  metric  p  that  is  implicitly  assumed  to  be  a  Euclidean  metric.  However, 
none  of  the  possible  velocity  vectors  at  that  state  are  able  to  proceed  in  the 
required  direction.  The  closest  xnew  that  can  be  generated  from  x1  is  a  state 
that  has  already  been  visited  x 2.  This  results  in  g  =  0.  Despite  the  fact  that 
p{x°,  xrand)  >  p(x1,  xrand),  x°  is  actually  more  suited  to  grow  R{Tx,At)  to¬ 
ward  xrand  because  the  possible  velocity  vectors  include  a  direction  that  moves 
toward  xrand  resulting  in  g  =  gmax.  In  addition  to  testing  problems,  this  situa¬ 
tion  arises  in  a  variety  of  robotic  applications  where  the  system  is  nonholonomic 
( e.g .,  wheeled  carts),  and  particularly  in  systems  with  unilateral  constraints  on 
velocities  {e.g.,  unmanned  aerial  vehicles).  Ideally  both  distance  and  velocity 
constraints  should  be  used  to  estimate  a  “time  to  go”. 

Since,  a  node,  xnear  can  successfully  connect  to  xrand  €  R{xnear ,  tnear ,  At) 
it  is  useful  to  estimate  the  time  required  to  go  from  a  possible  xnear  to  xrand 
as  compared  with  At.  Therefore,  we  propose  replacing  p{ x^ ,xrand)  in  Line  2  of 
Algorithm  2  with  a  local  first  order  approximation  of  the  “time-to-go” 


t’2 go 


rand 


)  = 


p{xi ,  xrand) / v  if  >  0 
oo  if  v  <  0 


(9) 


where  v  represents  the  instantaneous  speed  with  which  xrand  can  be  approached 

dp(x,  xrand) 


v  =  max 

USE/ 


dx 


f  {x,u)  \x—xi 


Intuitively  t2go  computes  the  distance  from  x J  to  xrand  and  divides  by  a  first 
order  approximation  of  the  speed  with  which  the  distance  can  be  decreased, 
giving  t2g0  units  of  time.  Note  that  a  negative  value  of  v  implies  that  the 
distance  is  actually  increasing,  which  can  be  interpreted  as  infinite  “time-to-go” 
(to  first  order).  In  a  given  iteration  if  none  of  the  existing  nodes  have  a  finite 
value  for  t2go ,  one  can  be  chosen  at  random  or  based  on  some  secondary  criteria 
(such  as  distance  as  determined  by  p). 

From  a  computational  point  of  view,  the  maximization  may  be  done  by  ex¬ 
haustive  search  or  by  exploiting  some  problem  dependent  feature.  For  example 
if  f(x,u)  is  an  affine  function  of  u  and  the  set  U  is  the  Cartesian  product  of 
rectangles,  the  maximization  is  a  linear  program  in  n  dimensions  which  can  be 


11 


Figure  4:  The  system  dynamics  of  the  thermostat. 


Figure  5:  The  solution  of  the  thermostat  counter  example  via  the  RRT  using 
the  dynamics-based  selection  of  proximal  node  (temperature  vs.  time). 


solved  efficiently.  If  no  efficient  methods  exist  to  compute  this  quantity,  evalu¬ 
ating  every  node  via  this  method  can  be  intensive.  In  such  a  case,  t.2go  can  be 
used  as  a  secondary  criterion  to  select  xnear  among  the,  for  example,  10  closest 
nodes  according  to  the  Euclidean  metric. 

We  next  consider  an  example  that  is  from  the  verification  community.  Al¬ 
though  it  is  not  central  to  robotics,  it  has  many  of  the  properties  that  are  central 
to  multi-agent  robotic  systems. 

Example  4.2  The  hybrid  automata  model  of  a  thermostat  has  been  a  popular 
example  in  the  verification  literature  [2f].  Figure  4  shows  the  system  model, 
x  =  (x\,X2,X3)  £  X  C  R3  where  x±  is  the  temperature  in  the  room,  X2  is  the 
elapsed  time,  and  x%  is  a  timer  that  measures  the  cumulative  amount  of  time  the 
heater  has  been  on  for.  The  dynamics  have  two  modes  which  denote  the  heater 
being  “on”  or  “off”.  U  consists  ofuon  =  [2,4];  andu0ff  =  [-3,-1].  The  values 
uon  and  u0f  /  represent  the  possible  heating  and  cooling  rates  in  the  two  modes. 
The  conditions  x\  <  1  and  X\  >  3  enable  the  mode  switches  off  — ►  on  and 
on  — >  off  respectively.  In  [24]  a  symbolic  verification  tool  is  used  to  answer  the 
question:  “After  an  initialization  period  of  two  minutes,  is  it  possible  for  the 
heater  to  be  on  for  more  than  two  thirds  of  the  total  time  at  any  point  during 
the  first  hour  of  operation?”  Such  a  question  may  be  important  from  an  energy 
consumption  point  of  view.  The  specification  set  is 

S  =  {x  £  X\2/2)X2  -  13  <  0  A  — X2  +  2  <  0}. 

The  initial  conditions  were  mode  =  “on”,  and  x°  =  [2  0  0]T. 
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Metric 

No.  of 
Nodes 

Computation 
Time  (sec) 

Euclidean 

^2  go 

2284 

1627 

376.4 

231 

Table  1:  Thermostat  Example:  A  comparison  of  the  use  of  the  Euclidean  metric 
and  t2go  introduced  in  Section  4.1,  averaged  over  10  trials  on  a  1GHz  PC. 


Aside  from  being  a  classical  verification  example,  the  scenario  is  interesting  in 
its  own  right.  First,  the  system  has  quite  nontrivial  dynamics,  since  the  control 
inputs  do  not  appear  in  the  right  hand  side  of  two  of  the  state  equations,  or 
the  specification  equations.  This,  together  with  the  narrow  range  of  U,  makes 
the  reachable  set  a  small  subset  of  X.  The  set  of  possible  velocity  vectors  at 
every  point  is  very  limited  making  this  an  ideal  example  to  demonstrate  the 
Dynamics-based  selection  of  proximal  node. 

First  the  problem  was  solved  10  times  selecting  proximal  nodes  based  on  the 
Euclidean  metric  p;  then  10  times  with  the  Dynamics-based  selection  function 
t2go-  In  all  cases,  the  algorithm  successfully  computed  a  counter  example  as 
seen  in  Figure  5.  Table  1  shows  the  computational  statistics  for  two  algorithms. 

4.2  History-based  selection  of  proximal  node 

A  second  situation  is  shown  in  Figure  6  where  the  traditional  RRT  is  applied  to 
the  system  and,  after  3  iterations,  the  resulting  tree  is  shown  using  dark  circles 
and  dashed  line  segments.  The  reachable  set  is  the  white  region.  Because  the 
reachable  set  is  so  small,  nodes  on  the  boundary  such  as  ar,a;  ,and  x1  will 
tend  to  maintain  disproportionately  large  Voronoi  regions  causing  them  to  be 
repeatedly  selected  as  xnear.  Initially,  the  boundary  region  is  explored  quickly. 
But  then,  as  in  the  figure,  when  xrand  0  R(x°,  t°,  AT),  then  xnear  <—  xl  and 
xnew  _  x2  gu£  since  xnew  jg  a]ready  a  node  in  the  tree,  the  algorithm  fails  to 
extend  the  volume  of  the  explored  set  R(Tk,  A t)  ( i.e .  g  =  0).  Repeated  failure 
to  extend  the  boundary  nodes  prevents  the  interior  nodes  from  extending  within 
the  reachable  set. 

To  counter  balance  this  Voronoi  bias,  we  propose  a  weighting  to  prevent 
the  repeated  failure  of  extension.  If  a  node  is  selected  as  xnear  in  Line  2  of 
Algorithm  2  and  the  minimization  in  Line  3  produces  an  input  unew  which  has 
been  applied  previously,  the  resulting  xnew  is  already  an  element  of  T.  When 
this  happens  we  say  the  node  has  “failed  to  extend” ;  and  determine  the  next 
best  unew  which  extends  explored  region  (suggested  in  [13]).  For  each  x3  G  T  we 
propose  storing  the  number  of  times  the  node  has  failed  to  extend  n 3 .  This  value 
can  be  used  to  compute  a  penalty  weight  to  discourage  the  repeated  selection 
of  boundary  nodes  which  fail  to  extend.  Let  ?rmin  and  nmax  be  the  least  and 
greatest  values  of  n3  in  the  tree  at  a  given  iteration.  The  History-based  weighting 
is  defined  as 
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R(x°,t°,AT) 


Figure  6:  The  reachable  space  is  shown  as  the  white  triangular  region,  and  bold 
circles  and  dashed  lines  are  the  RRT.  Nodes  on  the  boundary  of  the  reachable 
space  have  disproportionately  large  Voronoi  regions,  causing  them  to  repeatedly 
be  selected  as  xnear. 


H(x\xrand-p)  =  p(-Xl:Xrand)  Pmin  +  U°  nmin  (10) 

Pmax  Pmin  ^max  ^min 

where  pm in  =  min xieq-  p(xl ,xrand)  and  pmax  is  defined  in  a  similar  manner. 
These  bounds  are  used  to  normalize  the  distances  so  that  the  impact  of  the  sec¬ 
ond  term  is  not  problem  dependent.  Note  that  any  distance  function,  including 
t2go  can  be  substituted  for  p. 

Example  4.3  The  RRT  algorithm  is  used  to  find  trajectories  of  the  linear  dy¬ 
namic  system  with  bounded  control  inputs  in  the  form  of 

x  =  Ax  +  Bu  +  b  (11) 

where  xGX  =  [-200,  200]  x  [-200,  200]  andu&U  =  [-10,  10]  x  [-10,  10]. 

Figure  7  shows  trajectories  generated  by  the  RRT  algorithm  using  the  Euclidean 
metric  (left)  and  using  the  History-based  weighting  described  above  (right). 
Note  that  reachable  set  is  small  fraction  of  the  environment.  The  interior  of 
the  reachable  region  with  the  History-based  selection  of  proximal  node  method 
is  much  more  densely  covered  than  when  using  the  Euclidean  metric.  Figure  8 
shows  nJ  for  each  node  in  T.  Nodes  are  sorted  in  descending  order  to  facilitate 
visualization.  In  the  conventional  RRT  algorithm,  a  smaller  portion  of  nodes 
have  disproportionately  high  values  of  nJ  (ones  on  the  boundary  of  the  reachable 
set). 

4.3  Adaptively  biased  sample  generation 

Both  previous  enhancements  sought  to  maximize  the  growth  of  the  explored 
region  by  increasing  g  when  xrand  is  generated  in  the  unexplored  region.  But 
Equation  (5)  indicates  another  possibility  is  to  select  a  probability  distribution 
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Figure  7:  RRT  for  a  linear  system  using  the  Euclidean  metric  (left)  vs.  a  History- 
based  selection  of  proximal  node  (right).  After  5000  nodes  the  explored  region 
of  the  reachable  space  is  much  more  densely  covered  when  using  the  weighting. 


0  1000  2000  3000  4000  5000 

Nodes  sorted  in  descending  order 


0  1000  2000  3000  4000  5000 

Nodes  sorted  in  descending  order 


Figure  8:  Value  of  for  each  node  (sorted  in  descending  order)  using  the 
unweighted  Euclidean  metric  (left)  and  History-based  weighting  (right). 
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R(TK,At) 


Figure  9:  The  reachable  space  is  shown  as  the  white  triangular  region,  bold 
circles  and  dashed  lines  are  the  RRT.  The  algorithm  terminates  when  xnew  lies 
in  the  specification  set  S. 


that  would  increase  Pext  -  the  probability  a  node  is  selected  in  the  exterior  of 
the  explored  set  X  —  R(Tx,At).  However  for  complex  systems  there  are  two 
main  challenges  to  this:  (1)  computing  the  explored  set  online  is  difficult;  and 
(2)  not  all  areas  of  the  unexplored  set  are  reachable. 

Issue  1  can  be  overcome  by  realizing  that,  until  the  algorithm  terminates, 
the  specification  (or  goal)  set  S  lies  in  the  unexplored  region  by  definition  and 
that  S  has  a  convenient  representation.  Since  we  are  seeking  a  trajectory  to 
connect  x°  to  S  it  makes  sense  to  bias  our  sampling  distribution  inside  S, 
as  indicated  in  Figure  9.  Assuming  we  follow  this  strategy,  Issue  2  presents  a 
greater  challenge.  Complex  system  dynamics  can  render  S  unreachable  in  which 
case  the  problem  does  not  have  a  solution.  However,  a  potential  danger  is  that 
S  is  reachable  but  the  tree  fails  to  reach  it  using  a  biased  sampling  approach 
due  to  a  non-convex  state  space  or  a  lack  of  small  time  local  controllability. 
Adaptive  biasing  circumvents  this  problem.  Intuitively,  biasing  the  sampling 
distribution  for  xrand  to  generate  a  disproportionate  number  of  samples  inside 
the  set  S  is  effective  when  the  system  is  easily  steered  toward  S  (i.e.  the  system 
is  output  controllable  with  respect  to  s(x)).  To  estimate  this  we  update  the 
amount  of  biasing  for  every  Ns  iterations  of  the  RRT  algorithm,  where  Ns  is 
user  defined  number.  If  in  a  given  iteration  p(xnear ,  xrand)  >  p(xnew,  xrand), 
where  p  is  a  metric  function,  we  call  such  an  iteration  successful  because  the  tree 
has  grown  toward  xrand.  We  count  the  number  of  successful  iterations  ns,  out 
of  the  np  iterations  where  random  states  were  generated  inside  the  set  defined 
by  s(x)  <  0  and  compute 

/?  =  '—•  (12) 

np 

Values  of  (3  close  to  unity  indicate  biasing  sample  generation  inside  S  has  been 
beneficial. 

Our  proposed  probability  density  function  B(x;p,fi),  to  be  used  in  Line  1 
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Figure  10:  The  distribution  B{x\  p,  (3)  with  p  =  0  and  various  values  of  (3. 


of  Algorithm  2,  resembles  a  Gaussian  over  some  compact  set,  a  <  x  <  b 


B(x;p,/3) 


N(x ;  /X,  <j{(3))  +  Ct/{b  —  a),  a  <  x  <  b 
0  otherwise 


(13) 


where  N(x;  p,cr(/3))  is  the  Gaussian  distribution  with  mean  p  and  standard 
deviation  o((3).  The  last  term,  Ct/{b  —  a),  is  added  to  ensure  that  the  area 
under  the  curve  is  equal  to  one.  Ct  represents  the  area  of  the  truncated  portions 
above  x  =  b  and  below  x  =  a 

/a  roc 

N(x;  p,a)dx  +  /  N(x;  jx,a)dx. 

-oo  J  b 

Obviously  \x  should  be  selected  so  that  s(/r)  <  0.  The  standard  deviation 
of  N(x;  n,  <r(/3))  effectively  determines  the  bias  and  should  be  computed  using 
(3  £  [0, 1] 

^(Z^)  =  (1  /5)(crmax  crmin)  +  crmin,  (14) 

where  crmax  and  crmin  are  user-defined  values  of  the  maximum  and  minimum 
standard  deviations. 

Figure  10  illustrates  the  shape  of  B(x;n,(3)  with  different  values  of  (3.  Dis¬ 
tribution  (13)  can  be  easily  implemented  using  any  random  normal  generator 
and  rejecting  points  generated  outside  the  compact  domain. 


Example  4.4  We  consider  a  hovercraft  in  constant  altitude  flight  with  6  states, 
x  =  (xi,X2,0,vi,V2,w).  The  dynamic  equations  are 

mi)  1  =  (/i  +  /2)  COS (9)  +  fXlair(x,  Vair{x )) 

mi>2  =  (fl  +  f2)sin(9)  +  fx2air(x,Vair{x)) 

Jd)  —  (J*2  fi)l  +  Tair{x,Vair{xfl 

The  control  inputs  are  u  =  [fi  /2]T  (forward  actuating  forces)  and  U  =  [—10, 10]  x 
[—10,10].  Forces  due  to  wind  disturbances  in  the  x\,  X2  and  9  directions  are 
fxtair,  fx2air,  ond  Tair  whose  exact  expressions  are  omitted  for  brevity  but  are 
quadratic  in  the  difference  between  the  craft 's  velocity  and  the  wind  velocity  vair 
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Figure  11:  RRTs  of  the  hovercraft  problem  with  uniform  sampling  (left)  and 
with  adaptive  bias  (right). 


Figure  12:  The  evolution  of  the  biasing  factor  /3  for  the  hovercraft  problem. 


and  vary  with  the  orientation  of  the  craft.  Note  that  the  state  is  partitioned  into 
two  regions  (indoor  and  outdoor)  which  define  the  wind  velocity  differently: 

_  /  \-avx2  /3vx i]T,  ^/(xi)2  +  (x2)2  <  100 
Vair  \  [o  OF,  V(xi)2  +  (X2)2  >  100  ' 

We  would  like  to  determine  if  a  hovercraft  under  these  wind  conditions  can 
reach  some  goal  zone,  S  =  {(x\,x2)  £  [190,200]  x  [0,10]}.  Note  that  when 
outdoors,  the  wind  forces  are  significantly  greater  in  magnitude  than  the  control 
inputs,  making  the  system  uncontrollable.  The  initial  state  is  x°  =  [0  0  0  0  0  0]T. 

The  distribution  (13)  was  used  to  solve  the  problem  10  times  on  a  1GHz  PC. 
Figure  11  shows  the  solutions  of  the  problem  with  the  uniform  sampling  dis¬ 
tribution  and  adaptive  bias.  Figure  12  shows  how  /3  changes  as  the  algorithm 
evolves.  The  adaptive  algorithm  is  able  to  exploit  the  situations  in  which  bias¬ 
ing  is  effective.  As  shown  in  Table  2,  the  adaptive  biasing  algorithm  improves 
the  efficiency  of  RRT  method  compared  to  other  fixed  bias  strategies  rather 
dramatically. 
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Sampling  Method 

No.  of  Nodes 

Computation  Time  (sec) 

Uniform 

3556 

1753.5 

Med.  Bias 

1017 

490.2 

Heavy  Bias 

912 

408.3 

Adaptive  Bias 

678 

342.5 

Table  2:  Hovercraft  Example:  A  comparison  of  the  sampling  strategy  introduced 
here  (Adaptive  Bias)  to  fixed-bias  sampling  strategies,  averaged  over  10  trials 
on  a  1GHz  PC. 

5  Unified  Algorithm 

In  this  section  we  introduce  a  unified  algorithm  incorporating  the  three  enhance¬ 
ments  presented  earlier  and  validate  the  new  algorithm  on  a  large  scale  example 
problem. 

5.1  Unified  algorithm 

Algorithms  3  and  4  present  the  unification  of  the  enhancements  presented  in 
the  previous  section.  Note  that,  since  most  robotic  problems  are  controllable, 
Algorithm  1  can  terminate  when  a  solution  is  found.  In  our  case,  it  is  a  dis¬ 
tinct  possibility  that  no  solution  exists  so  we  impose  a  secondary  termination 
criterion.  We  can  use  the  dispersion-based  explored  fraction  defined  in  Sec¬ 
tion  3  to  monitor  tree  growth.  The  change  in  over  the  trailing  N  iterations 
A  C$  ,  measures  the  growth  of  the  tree.  If  AC^  drops  below  some  user-defined 
AC£min  we  terminate  the  search.  Before  running  simulations,  the  user  needs 
to  set  the  user-defined  values  such  as  U,  ACj£  ;  ,  N ,  Sx,  Ns,crm.in  and  <Jmax  and 
initialize  (3  =  1. 


Algorithm  3  Generate  enhanced-RRT:  T 

Initialize  RRT:  T.addVertex(a;0  <— ■  xlmt,n°  0) 

Global:  f3  =  1 

while  (fix  €  T  such  that  s(x )  <  0)  AND  A C%  >  A C^min  do 
enhanced-  Extend(T) 

end  while 


5.2  A  Multiagent  Problem 

We  consider  a  problem  where  multiple  autonomous  vehicles  must  guard  against 
an  intruder  entering  a  designated  area.  This  scenario  has  applications  in  games 
such  as  “capture  the  flag” .  It  has  applications  in  homeland  security  where 
autonomous  vehicles  (boats,  airplanes,  ground  robots)  can  be  deployed  to  detect 
unidentified  vehicles  entering  a  cordoned-off  area  or  an  exclusion  zone. 

In  this  example,  we  examine  a  circular  area,  Se  guarded  by  4  robots.  Each 
robot  has  sensor  foot  prints  which  are  assumed  to  be  circular  with  radius  Rd 
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Algorithm  4  enhanced-Extend(T) 

&  —  (1  ^)(^ma x  ^min)  T  f^min 

xrand  g  x  <—  B{x\n,f3)  (see  Equation  (13)) 

j.near  arg minxjer  [H (at* ,  xrand;  t2go)]  (see  Equation  (9), (10)) 

m"6™  =  arg  minueu  [t2flo  ( xnear ,  a:™717*)] 

j.neu,  =  xnear  +  JA*  /( X,Unew{t))dt 

if  a;7167"  =  e  T  then 


while  a:™6"’  =  iJ  £  T  do 
U  *-U  -  unew 

unew  _  argmin„ep[f2so(a:7 

a;7161"  =  a:near  +  fAt  f(x,  u 

end  while 
end  if 

T.addVertex^7767",  nnew  =  0) 
T.  addEdge  (unew,  xnear  ->■  a:716 
reset  U 

if  As  iterations  then 

.1  =  7‘* 

^  »/3 

end  if 


near  ^ rand\ 


for  detection  and  Rc  for  capture,  as  shown  in  Figure  13. We  assume  each  of  the 
intruder  and  guard  robots  has  5  states,  xl  =  (x\,  x\,  0l ,  vl,  u>1)  and  2  control 
inputs,  ul  =  (u\,ul2)  where  x1  and  u 1  indicate  states  and  input  of  the  intruder. 
The  dynamics  with  nonholonomic  constraints  are  given  by: 

x\  =  v‘cos(0‘),  x\  =  vlsin(6z),  0i  =  , 

Vi  =  u\  ,  w’  =  Un. 


We  can  define  the  free  space  A  =  A1  xX2x-  ■  -xA5\IJ^=2  B(xl{t),  Rc)  C 


where 


A7  =  {(xlxle^v*,^)  e  k5|(4)2  +  (4)2  <  Rft 

B(xl(t),Rc)  =  {(x\,xl)\(x\  -  x\)2  +  {xl  -  xl)2  <  R2c }. 

Then  the  specification  set  S  is  defined  by 

S={GX\{x\)2  +  {xl)2<R2s} 

where  Rs  is  the  radius  of  the  circle  Ce- 

The  guarding  scheme  is  shown  in  Figure  14.  Initially,  the  guard  robots 
distribute  evenly  along  the  perimeter  of  the  exclusion  zone.  If  the  intruder 
enters  the  detection  range  of  a  guard  robot,  the  robot  pursues  the  intruder  and 
other  robots  redistribute  evenly  along  the  circle  Ce-  If  the  intruder  escapes 
the  detection  range  of  the  pursuing  robot,  the  robot  returns  to  the  perimeter 
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and  all  robots  redistribute  evenly.  To  evenly  distribute  guard  robots  along  the 
perimeter,  we  use  the  algorithm  proposed  in  [11].  Each  guard  robot  is  subject 
to  the  force  __ 

=  —  /cV'02(<7j)  —  cq>  +  ^  Fr{q\qk)  (16) 

fceiVj 

where  q ?  =  (ar},^)  £  R2  is  the  position  of  robot  j ,  ip  :  M.2  M.  is  an  implicit 
function  description  of  the  perimeter  of  the  exclusion  zone  that  must  be  guarded 
and  Nj  is  the  set  of  robots  neighboring  robot  j.  Fr  is  a  Coulomb-like  repulsive 
force  that  ensures  that  the  robots  do  not  cluster  together,  while  c  is  a  constant 
which  provides  a  viscous  damping  term.  The  force  is  applied  to  a  point  that  is 
at  a  finite  distance  away  from  a  robot  to  address  nonholonomic  constraints.  A 
detailed  description  of  the  control  law  including  a  proof  of  convergence  to  dif¬ 
ferent  shapes  is  provided  in  [11].  However,  the  analysis  in  that  paper  cannot  be 
used  to  predict  the  transients  as  each  guard  robot  moves  toward  the  perimeter. 

The  question  we  wish  to  answer  is  as  follows.  If  an  intruder  or  an  adversary 
is  allowed  to  start  anywhere  in  a  specified  region  57,  and  the  guard  robots, 
employing  the  control  law  described  above,  are  initially  evenly  distributed  on 
the  circle  Ce,  can  the  intruder  enter  the  exclusion  zone  ( Se )  uncaptured?  The 
answer  to  our  question  can  only  be  found  by  searching  for  an  initial  condition 
and  a  control  input  function  for  the  intruder  which  drives  it  into  the  exclusion 
zone  without  crossing  any  of  the  capture  ranges.  Note  that  the  reachable  set  of 
states  in  X  is  a  small  subset  of  the  entire  state  due  to  the  fact  that  the  system 
is  uncontrollable  and  U  is  bounded.  Finally,  note  that  the  intruder  can  start 
anywhere  in  the  set  5/.  In  other  words,  the  initial  condition  for  the  intruder 
must  be  chosen  from  this  finite  set,  each  condition  leading  to  a  RRT. 

We  apply  the  RRFT  algorithm  [21]  with  enhancements  suggested  in  Sec¬ 
tion  5.1  to  the  problem.  The  control  inputs  are  u  =  (u\,ul)  £  U  =  [—5,  5]  x 
[— 7r/15,  7t/15]  with  Ri  =  300 m,  Rs  =  100m,  Rd  =  110m  and  Rc  =  40m. 
Figure  15  shows  the  forest  of  trees  where  a  solution  trajectory  is  found  for  the 
algorithm  with  the  “history-based  selection  of  proximal  node” ,  visualizing  the 
position  of  the  intruder.  Eight  initial  conditions  are  generated  and  a  forest 
starts  to  grow  until  a  solution  is  found.  Table  3  shows  the  statistics  obtained 
for  this  example  with  all  the  enhancements.  The  second  column  shows  the  av¬ 
erage  number  of  nodes  used  to  find  a  solution  trajectory  for  the  intruder  robot 
(one  such  trajectory  is  shown  in  Figure  15).  The  third  column  shows  the  com¬ 
putation  time  with  different  options.  The  first  main  point  to  note  from  these 
two  columns  is  that  the  standard  algorithm  takes  eight  times  as  long  requiring 
eight  times  the  number  of  nodes  to  find  the  same  solution.  The  second  main 
point  in  this  example  is  that  adaptive  biasing  allows  the  most  improvement  in 
efficiency  among  the  three  enhancements.  This  is  because,  while  the  probability 
distribution  used  does  increase  Pext ,  the  primary  motivation  in  selecting  it  was 
to  grow  the  tree  toward  the  goal,  as  fast  as  possible.  Figure  16  shows  the  change 
of  (3  as  the  algorithm  with  “adaptively  biased  sample  generation”  evolves. 

Figure  17  shows  the  comparison  of  the  dispersion-based  explored  fraction 
of  the  original  algorithm  compared  with  the  history-based  and  dynamics-based 
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Figure  13:  Initial  conditions  for  guard  robots  and  intruder.  Each  robot  has  a 
detection  range  Rd  within  which  the  intruder  is  detected,  and  capture  range  Rc 
within  which  the  intruder  is  captured. 


Figure  14:  Guarding  scheme  of  the  robots.  Distribute  until  the  intruder  is 
detected  (left);  and  pursue  if  the  intruder  is  within  the  detection  range  of  a 
guard  robot  (right). 


selection  of  proximal  node.  As  expected  the  dynamics-based  and  history-based 
algorithms  provide  better  tree  growth. 


Enhancement 

Method 

No.  of 
Nodes 

Computation 
Time  (sec) 

No  Enhancement 

25251 

14978.8 

Dynamics-based 

14944 

7549.4 

History-based 

12896 

6387.7 

Adaptively  biased 

4924 

3396.9 

Three  Enhancements 

3352 

1976.3 

Table  3:  Guard-intruder  Example:  A  comparison  of  the  algorithm,  with  and 
without  enhancements  averaged  over  10  trials  on  a  3GHz  PC. 


6  Conclusion 

The  RRT  algorithm  has  been  successful  in  solving  complex  motion  planning 
problems.  We  explore  the  application  of  this  algorithm  and  its  variants  to  the 
problem  of  testing  complex  reactive  control  systems  and  validating  the  effec- 
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Figure  15:  The  forest  of  RRTs  with  8  different  initial  conditions.  A  solution 
trajectory  of  the  intruder  is  highlighted  on  the  bottom. 


tiveness  of  multi-agent  controllers.  Testing  and  validation  involve  searching  for 
conditions  that  lead  to  system  failure  by  exploring  all  adversarial  inputs  and 
disturbances  for  errant  trajectories.  Unlike  motion  planning  problems,  the  sys¬ 
tems  may  not  be  controllable  with  respect  to  disturbances  or  adversarial  inputs 
and  the  reachable  set  of  states  is  generally  a  small  subset  of  the  entire  state 
space. 

Because  of  the  differences  between  testing  and  motion  planning,  we  propose 
three  modifications  to  the  original  RRT  algorithm.  First,  we  develop  a  new 
distance  function  which  encodes  local  information  about  the  system’s  dynamics 
with  a  first  order  approximation.  Second,  because  the  reachable  state  space  is 
generally  a  small  fraction  of  the  total  state  space,  we  modify  the  node  selection 
strategy  to  discourage  the  repeated  selection  of  boundary  nodes.  Finally,  we 
propose  a  scheme  for  adaptively  modifying  the  sampling  probability  distribu¬ 
tion  based  on  tree  growth.  We  demonstrate  the  application  of  the  algorithm  via 
three  simple  examples  and  one  large  scale  (25  dimensions)  multi-agent  pursuit- 
evasion  example  problem  and  provide  computational  statistics  demonstrating 
a  reduction  of  computation  time  by  a  factor  of  eight.  To  demonstrate  im¬ 
provements  in  the  rate  at  which  the  state  space  is  explored  with  the  dynamics- 
based  and  history-based  selection  of  proximal  node  algorithms,  we  derived  a 
dispersion-based  measure  of  the  explored  set.  Growth  rates  of  the  complex  dy¬ 
namic  systems  are  qualitatively  compared  with  and  without  enhancements  of 
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Figure  16:  The  evolution  of  the  biasing  factor  (3  for  the  guard-intruder  example. 


Figure  17:  A  comparison  of  the  dispersion-based  explored  fraction. 


the  algorithm  and  the  result  shows  distinct  improvements. 

Note  that  the  enhancements  introduced  here  are  targeted  at  systems  with 
complex  (non-linear,  non-holonomic,  or  non-smooth)  dynamics  which  cause 
large  portions  of  the  state  space  to  be  unreachable.  In  contrast,  motion  plan¬ 
ning  problems  with  simple  dynamics  (holonomic)  but  geometrically  complex 
spaces  (e.g.  narrow  passages),  pose  a  different  set  of  challenges.  The  motiva¬ 
tion  for  the  History-based  and  Dynamics-based  selection  techniques  here  is  an 
underlying  assumption  that  the  connection  attempt  (i.e.  generating  xnew)  is 
(1)  computationally  expensive  and  (2)  unlikely  to  actually  reach  xrand  due  to 
small  At.  Therefore  metrics  are  used  as  less  than  ideal  proxies  to  guide  the 
selection  of  which  node  to  expend.  In  contrast,  for  systems  with  simple  dynam¬ 
ics,  connecting  a  random  node  to  the  existing  search  graph  can  be  trivial  and 
can  be  possibly  be  tested  exhaustively.  A  more  daunting  issue  for  geometrically 
complex  problems  has  to  do  with  generating  a  sufficient  number  of  samples  in 
“critical”  regions  of  the  configuration  space  like  narrow  passages.  To  that  end 
the  concept  of  biasing  the  distribution  adaptively  can  be  useful. 

Based  on  our  experience  we  find  that  the  relative  impact  each  enhancement 
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has  on  algorithmic  efficiency  is  problem  dependent.  For  example,  the  history- 
based  selection  method  yields  significant  improvements  when  the  volume  of  the 
reachable  set  is  a  very  small  fraction  of  state  space  (Example  4.3),  and  the  goal 
set  lies  in  the  interior  of  the  reachable  set.  Fortunately  it  requires  little  com¬ 
putational  overhead  so,  even  when  little  is  known  about  the  system,  there  is 
little  drawback  to  using  it.  In  general,  biasing  the  sample  distribution  yields 
dramatic  improvements  in  regions  of  the  state  space  where  the  system  is  small 
time  locally  controllable  or  the  configuration  space  is  simply  connected.  For 
systems  in  which  this  is  not  true,  the  adaptive  approach  takes  slightly  longer  to 
find  a  solution  compared  with  a  uniform  distribution.  However,  by  virtue  of  be¬ 
ing  adaptive,  after  a  brief  transient  the  distribution  converges  to  near  uniform. 
Again,  little  overhead  is  required  to  implement  this  enhancement  so  there  is 
little  risk  in  including  it.  The  dynamics-based  selection  method  is  very  effective 
for  systems  that  are  not  small  time  locally  controllable;  particularly  those  with 
unilateral  constraints  on  velocities  (i.e.  a  aircraft  that  moves  forward  only). 
However,  the  computational  overhead  to  compute  Equation  (9)  can  vary  con¬ 
siderably.  If  the  input  range  is  discretized  into  Nu  values  and  an  exhaustive 
search  is  used  to  compute  the  best  value  of  u  at  all  K  nodes  in  the  tree,  it  needs 
K  ■  Nu  evaluation  of  f(x,  u)  and  inner  products  at  each  iteration.  If  the  tree  has 
many  nodes,  this  overhead  will  outweigh  the  benefits  of  the  enhancement.  An 
efficient  method  of  selecting  u  to  connect  to  a  state  is  required.  In  such  a  case, 
t2go  can  be  used  as  a  secondary  criterion  to  select  xnear  as  mentioned  in  Sec¬ 
tion  4.1.  It  has  been  our  experience  that  the  dynamics-based  selection  method 
provides  considerable  computational  improvement  even  with  the  overhead  for 
many  examples. 

Future  work  is  being  directed  to  a  systematic  approach  to  selecting  the  user- 
defined  parameters  of  the  algorithms  proposed  in  this  paper.  For  example,  as 
thresholds  for  stopping  the  growth  of  a  tree,  we  use  significantly  small  constant. 
However,  we  believe  there  exists  a  mathematically  meaningful  way  to  set  the 
threshold  on  an  acceptable  growth  rate.  Such  thresholds  will  be  of  importance 
since  they  are  directly  related  to  the  termination  criteria  for  the  algorithm. 
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